Improving the AUC of Probabilistic Estimation Trees

Authors

  • César Ferri
  • Peter A. Flach
  • José Hernández-Orallo
Abstract

In this work we investigate several issues in order to improve the performance of probabilistic estimation trees (PETs). First, we derive a new probability smoothing that takes into account the class distributions of all the nodes from the root to each leaf. Secondly, we introduce or adapt some new splitting criteria aimed at improving probability estimates rather than classification accuracy, and compare them with other accuracy-aimed splitting criteria. Thirdly, we analyse the effect of pruning methods and we choose a cardinality-based pruning, which is able to significantly reduce the size of the trees without degrading the quality of the estimates. The effect of these three techniques on the quality of probability estimates is evaluated with the 1-vs-1 multi-class extension of the Area Under the ROC Curve (AUC) measure, which is becoming widespread for evaluating probability estimators, and rankings of predictions in particular.
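The 1-vs-1 multi-class AUC named in the abstract is commonly computed by averaging the binary AUC over all class pairs, using each class's estimated probability as the ranking score. A minimal sketch of that scheme (function names are illustrative, not the paper's code):

```python
from itertools import combinations

def auc_binary(scores_pos, scores_neg):
    """AUC via the rank-sum statistic: the probability that a randomly
    chosen positive is scored above a randomly chosen negative (ties = 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def auc_1v1(y_true, probs, classes):
    """1-vs-1 multi-class AUC: for each unordered class pair (i, j), rank the
    instances of i and j by the estimated probability of i (and of j), take
    the mean of the two binary AUCs, then average over all pairs.
    probs[k] is a dict mapping class label -> estimated probability."""
    total, count = 0.0, 0
    for i, j in combinations(classes, 2):
        idx_i = [k for k, y in enumerate(y_true) if y == i]
        idx_j = [k for k, y in enumerate(y_true) if y == j]
        a_ij = auc_binary([probs[k][i] for k in idx_i],
                          [probs[k][i] for k in idx_j])
        a_ji = auc_binary([probs[k][j] for k in idx_j],
                          [probs[k][j] for k in idx_i])
        total += (a_ij + a_ji) / 2
        count += 1
    return total / count
```

Because the measure depends only on how the probabilities rank the instances, it rewards exactly the ranking quality the paper targets.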

Similar Articles

Decision Trees for Ranking: Effect of new smoothing methods, new splitting criteria and simple pruning methods

In this work we investigate several issues in order to improve the performance of probabilistic estimation trees (PETs). First, we derive a new probability smoothing that takes into account the class distributions of all the nodes from the root to each leaf. This enhances probability estimations with respect to other previous approaches without smoothing or with Laplace correction. Secondly, we...
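The Laplace correction mentioned here is the standard baseline that path-based smoothing is compared against: it adds one pseudo-count per class to the leaf's raw frequencies, pulling small-leaf estimates away from the extremes of 0 and 1. A minimal sketch:

```python
def leaf_probability(class_counts, num_classes):
    """Laplace-corrected class-probability estimates at a tree leaf.
    Raw frequencies n_c / n are systematically biased toward 0/1 in small
    leaves; one pseudo-count per class shrinks them toward uniform."""
    n = sum(class_counts)
    return [(c + 1) / (n + num_classes) for c in class_counts]
```

For a two-class leaf holding 3 positives and 0 negatives, this yields 0.8 / 0.2 rather than the overconfident 1.0 / 0.0 given by raw frequencies.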


Probabilistic analysis of the asymmetric digital search trees

In this paper, by applying three functional operators the previous results on the (Poisson) variance of the external profile in digital search trees will be improved. We study the profile built over $n$ binary strings generated by a memoryless source with unequal probabilities of symbols and use a combinatorial approach for studying the Poissonized variance, since the probability distribution o...


Learning Naïve Bayes Tree for Conditional Probability Estimation

Naïve Bayes Tree uses decision tree as the general structure and deploys naïve Bayesian classifiers at leaves. The intuition is that naïve Bayesian classifiers work better than decision trees when the sample data set is small. Therefore, after several attribute splits when constructing a decision tree, it is better to use naïve Bayesian classifiers at the leaves than to continue splitting the a...
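The leaf classifier described here can be sketched for categorical attributes as follows; this is a hypothetical illustration of a naïve Bayes posterior estimated from the samples that reached one leaf (with Laplace smoothing), not the cited paper's code:

```python
def nb_leaf_posterior(x, samples, classes):
    """Naïve-Bayes class posterior at an NBTree leaf.
    samples: list of (attribute_tuple, class_label) pairs that reached the
    leaf.  Under the naive independence assumption,
    P(c | x) is proportional to P(c) * prod_a P(x_a | c), with all factors
    estimated from the leaf's samples using Laplace smoothing."""
    n = len(samples)
    posts = {}
    for c in classes:
        in_c = [a for a, y in samples if y == c]
        p = (len(in_c) + 1) / (n + len(classes))      # smoothed class prior
        for i, v in enumerate(x):
            vals = {a[i] for a, _ in samples}          # observed values of attr i
            matches = sum(1 for a in in_c if a[i] == v)
            p *= (matches + 1) / (len(in_c) + len(vals))
        posts[c] = p
    z = sum(posts.values())                            # normalise to sum to 1
    return {c: p / z for c, p in posts.items()}
```

With few samples per leaf, the smoothed per-attribute estimates degrade more gracefully than further splits would, which matches the intuition stated in the abstract.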


An Empirical and Formal Analysis of Decision Trees for Ranking

Decision trees are known to be good classifiers but less good rankers. A few methods have been proposed to improve their performance in terms of AUC, along with first empirical evidence showing their effectiveness. The goal of this paper is twofold. First, by replicating and extending previous empirical studies, we not only improve the understanding of earlier results but also correct implicit ...


Many Are Better Than One: Improving Probabilistic Estimates from Decision Trees

Decision trees, a popular choice for classification, have limitations in providing probability estimates, requiring smoothing at the leaves. Typically, smoothing methods such as Laplace or m-estimate are applied at the decision tree leaves to overcome the systematic bias introduced by the frequency-based estimates. In this work, we show that an ensemble of decision trees significantly impr...
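The ensemble idea in this abstract is typically realised by training several trees (e.g. on bootstrap samples) and averaging their per-instance class-probability vectors; the mean of many coarse leaf estimates is finer-grained and lower-variance than any single tree's. A minimal sketch of the averaging step, assuming each tree already yields a probability vector:

```python
def ensemble_probability(per_tree_probs):
    """Average class-probability vectors produced by several trees for one
    instance.  per_tree_probs: list of equal-length probability lists."""
    k = len(per_tree_probs)
    n_classes = len(per_tree_probs[0])
    return [sum(p[c] for p in per_tree_probs) / k for c in range(n_classes)]
```

Two trees estimating (1.0, 0.0) and (0.6, 0.4) combine to (0.8, 0.2), a value neither tree could produce on its own.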



Journal:

Volume   Issue

Pages  -

Publication date: 2003